Computing network with nodes, procedures, machines to configure it, and relative uses.



A computing network with nodes (SU) is described for "general purpose"
uses that does not use the traditional Von Neumann architecture. Rather,
it is composed of a set of Similarity Units (SU) connectable among
themselves so as to allow configuration of a topology useful for the
solution of the processing problems under consideration.

COMPUTING NETWORK WITH NODES, PROCEDURES, MACHINES TO CONFIGURE IT, AND RELATIVE USES



Background of the Invention 

Almost all the data processing systems avail- 
able today are based on Von Neumann architec- 
ture, which allowed, and still allows, the realization 
of flexible, general use machines and systems. 
Notwithstanding the great developments in pro- 
gramming languages and techniques in the past 
few years, the cost and difficulty of producing 
software is still the greatest limit to efficient use of 
the potential of the computer today. Furthermore, 
an intrinsic limitation of Von Neumann-based programmable
machines is that, to develop a program to solve a problem,
it is necessary to know how to formalize an algorithm.
Often, this knowledge is not available; in many cases the
complexity and the amount of resources required render the
formalization of the necessary algorithms too onerous.

Furthermore, the class of problems of theoretical and
practical interest for which the computing power of
sequential machines is insufficient grows constantly. This
has pushed research centers and development laboratories
the world over to seek solutions able to progressively
overcome these limits. Numerous studies and development
ideas were thus born, with the goal of realizing processing
architectures that make ever greater use of parallelism.
Nevertheless, such architectures are almost always based on
a set of Von Neumann machines, organized in the most varied
ways, operating in parallel (simultaneously). Such
solutions, however, have proven less efficient than first
thought, for the following reasons:

• the programming difficulties already noted are not
reduced but rather increased by the need to parallelize the
processes while keeping them synchronized, and

• data transfer between the units operating in parallel
creates problems at both the hardware (data rate) and
software (management) levels, so that users benefit only
partially from parallelism.

So-called connectionism (Hopfield networks, Boltzmann
machines, Kohonen networks, etc.) constitutes a completely
different approach, which has returned to the forefront in
the past few years after about 20 years of complete
abandonment.

This approach, inspired in part by neural networks, bases
the processing scheme on multilayer networks of very simple
units with threshold-type responses. These units are
connected among themselves so as to provide, after an
appropriate training phase, the desired response to every
input stimulus. Even so, the learning generally works by
varying the "weights" of the connections, rather than the
transfer functions of the entities found in the network
nodes. This drastically limits the processing abilities,
necessarily leading to high complexity even for simple
problems. Furthermore, it makes it difficult to develop
efficient learning procedures with rapid convergence, not
to mention the requirement of long and costly network
reprogramming.

Goals and Summary of the Invention 

The invention presented here aims to develop a methodology
and data processing systems able to overcome the limits
described above. More precisely, its goal is to provide
processing able to:

• eliminate the necessity of software development,
substituting simple and efficient "learning" for
programming;

• allow the solution of problems for which there is no
known algorithmic solution;

• obtain effective calculation power notably superior to
that available today.

In summary, the solution presented by the invention is
based on the capacity to define and configure means able to
realize any calculation scheme and to learn, case by case,
a resolution strategy, without the necessity of knowing,
much less formalizing, an appropriate algorithm. Specific
"programming" by the user is not necessary. The set of
machines that are the subject of the invention and
derivable from it has, furthermore, a very high processing
ability, and assures execution times independent of the
complexity of the problem. These machines allow, among
other things, configurations obtained at the end of the
learning phase to be stored in memory, enabling libraries
of configurations to be formed and easily reloaded, giving
the same flexibility as conventional computers.

More specifically, the invention is based on the idea of
realizing data processing networks with a connection
topology in some way analogous to that of connectionist
networks, in which, however, the active connections do not
have variable weights. The network nodes are similarity
units (SU) that are modified during the learning phase to
allow realization of the transfer function requested. The
structure of the SU allows any transfer function to be
realized, a complete departure from the connectionist
example. The learning procedures, also objects of the
invention, allow the network to be trained to solve any
data processing problem without requiring programming, as
has already been mentioned. The invented networks are a
model of systems able to transform the input data space
(relative to any physical reality: images to recognize,
sampled signals, input data of an algorithm, etc.) into the
output data space, carrying out successive transformations
corresponding to the internal layers of the network.

More precisely, from the space of the input units, the
points of which represent the configurations of the input
data, one passes, by network connections, to a more
internal space, where the points are redescribed with new
components. The procedure is thereafter applied to
transform this new space into a successive space, until
reaching a space that corresponds to the layer of the
output units.

The network training procedures included in the invention
(covered by the patent) are based on hierarchic learning,
in which a statistically significant set of input data
configurations is provided as input to the network and the
corresponding desired configurations (the desired result of
the data processing) are imposed as output.

Each network node generates, for each distinct 
configuration at its input ports, an input-output code 
by association, using the methods described in the 
following paragraphs. The procedure is propagated 
to the top layer of the network, that is the data 
output layer. For the nodes of that layer the associ- 
ation codes therefore will be those imposed (direct 
learning). 

At the end of this phase, one may proceed to 
the substitution of those codes for which associ- 
ation always produces the same output codes, that 
is, when there is assurance that - for the data 
processing requested - these are equivalent. This 
procedure, which can be defined as back-learning, 
highlights the structures and the internal properties 
of the data treated. 

Each processing problem therefore has its respective
similarity net formed and trained. For each net the
following are defined:

• number of layers,

• number of nodes (SU) in each layer,

• connections,

• structure of the nodes, in terms of the number and type
of inputs and outputs and of the relative transfer
functions.

To obtain a general purpose data processing 
system, it is therefore necessary to design and 
realize architectures capable of building any net- 
work without having to construct ad hoc hardware 
each time. The set of such architectures 
(machines), which is also covered by the patent, is 
based on a modular structure of layers. Each layer 
principally consists of: 

• the nodes (SU) of the layer, 



• a connection management system (CMS), 

• a communication bus system, 

• a layer control unit (LCU). 

The CMS for each layer, or hierarchic level, of the network
has its inputs connected to the SU outputs of the preceding
layer and its outputs to the SU of the layer in question.
It is essentially a switching bank that allows, under
control of the LCU, realization of all connections wanted.
The number of layers and the number of SU per layer are
completely programmable under control of the system control
unit (SCU). This unit also manages the machine interface,
both with the operator and standard peripherals and, when
required, with process peripherals.

Detailed description 

The proposed invention will now be described, solely as a
non-limiting example, with reference to the attached
figures, in which:

• Figures 1-3 illustrate some of the possible ways of
implementing a network according to the principles
described here,

• Figure 4 shows the basic scheme of a machine using the
invention,

• Figure 5 shows the implementation scheme of one layer of
the machine in Fig. 4 in greater detail,

• Figures 6-7 describe the functioning of the CMS between
successive layers, and

• Figures 8-10 show the nodes (SU) of the network.

Essentially, a network built according to the principles
described (similarity net or S-net, labeled 1 in the
attached figures) is a hierarchic architecture whose nodes
(SU) are computational elements able to realize any
transfer function. Figure 1 shows, as an example, a
one-dimensional network with three layers, the SU of which
have three input ports and one output port.

SU are completely defined when their transfer function
$f_{ij}$ is defined. Of course, the transfer functions
$f_{ij}$ may in principle vary from one layer to another or
even within the same layer. Furthermore, it is understood
that in general the number of layers, the size of the
network, and the number of input and/or output ports may
vary according to the needs specified and the nature of the
problem. There also may be networks with input ports common
to more than one SU, with connections between different
layers, feedback, etc.

In general, then, a network 1 is defined directly by its
topological structure and by the transfer functions of the
SU of which it is composed.

Figure 2 instead shows, as a non-limiting example, a
four-layer, two-dimensional network 1, consisting of SU
with two input ports and one output port. Note that
two-dimensional network topologies of this type, if
necessary with SU that have different numbers of input and
output ports, are particularly suited to the treatment of
two-dimensional structures and data and to the processing
and recognition of configurations, for example in all
problems of Image Processing, Scene Analysis and
Recognition. Generally in these problems all SU of the same
layer may (or must) have the same transfer function. This
further simplifies the learning processes and reduces the
amount of memory allocated to the units of the layer. In
that case it is possible to organize the SU so that they
share a single memory for the entire layer, since they all
represent a single transfer function.

Learning methods 

In general these networks have several possible modes of
functioning, both in the learning and processing phases.
For descriptive simplicity some learning methods covered by
the patent will now be described for a one-dimensional
network with n (n ≥ 2) layers, with SU having two input
ports and one output port, as shown in Figure 3.

1. Deterministic direct learning 

Data configurations belonging to a learning set (LS) are
sent sequentially as input to the network, and therefore to
the input IN of the first-layer SU. The LS, of course, must
be "representative" of the universe of data for the
processing problem at hand. In deterministic learning,
representativity implies the presence in the learning set
of all the structures necessary to solve the problem. As
shown in Fig. 3, each first-layer SU assigns a code (for
example from 0 to k-1, if k is the number of distinct pairs
in LS) to each pair at its input.

In this way all data configurations of the learning set LS
(in the example, all quadruples in LS with values
$x_1, x_2, x_3, x_4$) are codified by the SU. This may be
implemented advantageously by requiring each SU to build a
transcodification matrix $E_{ij}(0:k-1,\ 0:k-1)$. Such
matrices obviously represent the relative transfer
functions $f_{ij}$. One then proceeds to "codify" the LS
with the codes produced at the first network layer. The
relative outputs constitute the inputs to the next layer.
For the following layers, up to the penultimate, one
proceeds as for the first layer, operating, however, on the
output pairs of the preceding layer. Finally, at the last
layer, an LS output code is associated with each pair
extracted from those produced at the preceding layer. At
this point the network is trained: the codes not assigned
correspond to configurations declared "non classifiable".
Such procedures obviously may be generalized to SU with any
number of input and output ports, and to networks with any
number of layers.
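The deterministic procedure above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: each SU is modelled as a dictionary from input pairs to codes, the network halves its width at every layer (so the number of inputs is assumed to be a power of two), and the names `train_deterministic`, `run` and `UNCLASSIFIABLE` are hypothetical, not taken from the patent.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

Table = Dict[Tuple[Hashable, Hashable], Hashable]
UNCLASSIFIABLE = None  # stand-in for the "non classifiable" code


def train_deterministic(
    learning_set: Sequence[Tuple[Sequence, Hashable]]
) -> List[List[Table]]:
    """Train a halving network of two-input SU on (inputs, desired_output) samples."""
    signals = [list(inputs) for inputs, _ in learning_set]
    targets = [target for _, target in learning_set]
    layers: List[List[Table]] = []
    width = len(signals[0])  # assumed: a power of two, at least 2
    while width > 1:
        width //= 2
        is_last = width == 1
        layer: List[Table] = [dict() for _ in range(width)]
        next_signals = []
        for sample, sig in enumerate(signals):
            out = []
            for unit, table in enumerate(layer):
                pair = (sig[2 * unit], sig[2 * unit + 1])
                if pair not in table:
                    # Last layer: impose the desired output code.
                    # Inner layers: assign the next free code 0, 1, 2, ...
                    table[pair] = targets[sample] if is_last else len(table)
                out.append(table[pair])
            next_signals.append(out)
        layers.append(layer)
        signals = next_signals
    return layers


def run(layers: List[List[Table]], inputs: Sequence) -> Hashable:
    """Process one input configuration; unknown pairs yield UNCLASSIFIABLE."""
    sig = list(inputs)
    for layer in layers:
        sig = [
            table.get((sig[2 * unit], sig[2 * unit + 1]), UNCLASSIFIABLE)
            for unit, table in enumerate(layer)
        ]
    return sig[0]
```

Trained on a few quadruples, `run` reproduces the imposed output for every configuration in the learning set and returns `UNCLASSIFIABLE` for a quadruple whose pairs were never coded together.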



2. Statistical direct learning

For a learning set LS, whose representativity (this time in
the statistical sense) of the universe of data is
fundamental, the frequency of the pairs of signals at the
input of the bottom layer of the network is counted. (Here
"pairs" of signals are discussed on the understanding that
the network SU have, as said, two input ports; more
generally, if h is the number of these ports, one deals
with ordered h-tuples.) On this basis the pairs with a
total frequency above a previously set threshold are
extracted and assigned a code (the next available, for
example from 0 to k-1, where k is the number of such
pairs). All other pairs are rated "unclassified". The
implementation consists of filling a codification matrix
E(0:k-1, 0:k-1) as in the case described previously.

This procedure is then applied, as in deterministic
learning, to successive layers up to the top of the
network, where the desired codes are imposed as output
values. When this procedure is finished the network is
statistically trained. The representativity and
trustworthiness of the learning are determined by the
frequency threshold mentioned above. Obviously, as these
thresholds tend to 1, statistical learning tends to
coincide with deterministic learning.
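A minimal sketch of the coding step for a single SU follows. It assumes that the threshold is an occurrence count and that a pair is kept when its count reaches the threshold, so that a threshold of 1 keeps every observed pair, in line with the limiting case just mentioned. Codes are handed out in order of decreasing frequency, which is one possible reading of "the next available" code. The function name `statistical_codes` is hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Tuple

Pair = Tuple[Hashable, Hashable]


def statistical_codes(pairs: Iterable[Pair], threshold: int) -> Dict[Pair, int]:
    """Assign codes 0..k-1 to the pairs seen at least `threshold` times."""
    counts = Counter(pairs)
    table: Dict[Pair, int] = {}
    # most_common() lists pairs by decreasing count.
    for pair, count in counts.most_common():
        if count >= threshold:
            table[pair] = len(table)  # next available code
    return table  # pairs absent from the table are "unclassified"
```

With the observations `[(0, 1)] * 3 + [(1, 1)] * 2 + [(1, 0)]` and a threshold of 2, only `(0, 1)` and `(1, 1)` receive codes; with a threshold of 1 all three pairs do.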



3. Direct learning based on distance

This procedure allows more complete network training, even
in the presence of a limited LS (as long as it remains
representative of the statistics of the input signals).
Moreover, it allows a reduction in the number of codes used
to define the transfer functions without similarly reducing
the processing abilities outside the LS.

The learning procedure continues in each layer as in the
previously described case, until extraction of the k most
frequent codes (which we call "proper codes") takes place.
For all other pairs, instead of the code "non
classifiable", the code of the closest pair with a proper
code is assigned, according to the rule established. In
other words, if A,B is a pair of layer j to which a proper
code $E_{j+1}(A,B)$ at layer j+1 has not been assigned, A,B
is classified with the code $E_{j+1}(X,Y)$, where X,Y is
the pair with a proper code for which
$D_j(A,X) + D_j(B,Y)$ is minimum (where $D_j$ is the
distance function at layer j, derivable from the distance
at layer 0).

The statistical direct learning procedures based on
distance are particularly well suited for use in
approximate calculation or in intrinsically redundant
signal processing. The latter is the case, for example, for
all structured signal processing applications, pattern
recognition and understanding (automatic vision, image
processing and recognition, speech recognition, written
information recognition, etc.).

4. Back-learning methods and procedures

Back-learning procedures allow the learning phase to be
refined, highlighting the structures corresponding to the
internal properties of the LS data. They are based on the
idea that if pairs (or groups) of SU input codes produce,
for a trained SU, the same codes at the output of the SU,
then the codes of the pair (or the group) are equivalent
for the processing purposes for which the network was
trained.

As a non-limiting example, suppose the transfer function of
an SU with two input ports and one output port, say the
i-th SU of the j-th layer, $E_{ij}(X,Y)$, is such that, for
all C,

(1) $E_{ij}(A,C) = E_{ij}(B,C)$ or $E_{ij}(C,A) = E_{ij}(C,B)$,

i.e. the input codes A and B always give rise to the same
codifications. Then, for the processing purposes for which
the network was trained, A and B are equivalent and may
therefore be identified.

For example, if in treating a text lower case "a" and upper
case "A", when combined with any other alphabetic symbol,
always lead to the same codes, the text in question may be
written indifferently with upper or lower case letters as
far as the processing is concerned. In this case the
condition presented in (1) is verified, and therefore lower
case "a" and upper case "A" are equivalent and may be
identified.
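The first alternative of condition (1) can be checked mechanically by comparing the rows of the transcodification table. The sketch below groups first-port codes whose rows are identical; entries never assigned in learning are represented by `None` and compared like any other value, which is a simplifying assumption. The name `first_port_classes` is hypothetical.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

Table = Dict[Tuple[Hashable, Hashable], Hashable]


def first_port_classes(
    table: Table,
    first_codes: Sequence[Hashable],
    second_codes: Sequence[Hashable],
) -> List[List[Hashable]]:
    """Group first-port codes A, B with E(A, C) == E(B, C) for every C."""
    classes: Dict[tuple, List[Hashable]] = {}
    for a in first_codes:
        # The row of code `a`: its output for every second-port code.
        row = tuple(table.get((a, c)) for c in second_codes)
        classes.setdefault(row, []).append(a)
    return list(classes.values())
```

Applied to a table in which "a" and "A" produce the same output for every second-port symbol, the function places them in one class, which is the situation of the upper and lower case example.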

The extension of the condition given in (1) to SU with more
than two input ports and several output ports is obvious
and therefore will not be explained here.

More generally, all the conditions of which (1) is an
example, showing partial or total equivalences of groups of
codes, indicate the presence of local or global properties
of the data relative to the processing tasks. Note that, as
far as equivalence is concerned, the codes not assigned in
learning may, if necessary, be considered indifferent. In
all these situations it is possible to use back-learning
which, besides highlighting these properties, allows a
dimensionality reduction of the transcodification matrices.
This simplifies network implementation and allows correct
processing of data outside the LS.

Three back-learning procedures are described below, purely
as examples, to clarify the concept of "substitution" of
equivalent codes (or groups of codes).

a) Deterministic back-learning 

Starting from the layer at the top of the network (last, or
output, layer), all codes that verify equivalence
conditions of type (1) are substituted. In the example
given in (1), B may be replaced by A at the SU input of the
j-th layer as well as at the output of the (j-1)-th layer.

The procedure then backtracks, layer by layer, to the SU of
the input layer. In general, this procedure may be iterated
until equivalent codes are no longer identified.
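One substitution step of this kind might look as follows in Python. The sketch assumes that two codes have already been found equivalent on one input port of an SU in layer j, and that the SU of layer j-1 feeding that port is given as a dictionary; the function name `substitute_code` and the dictionary representation are hypothetical.

```python
from typing import Dict, Hashable, Tuple

Table = Dict[Tuple[Hashable, Hashable], Hashable]


def substitute_code(
    prev_table: Table,
    next_table: Table,
    keep: Hashable,
    drop: Hashable,
    port: int = 0,
) -> Tuple[Table, Table]:
    """Replace `drop` by `keep` at the output of layer j-1 and at one input port of layer j."""
    # Layer j-1: every output equal to `drop` now emits `keep`.
    new_prev = {k: (keep if v == drop else v) for k, v in prev_table.items()}
    # Layer j: rows addressed by `drop` merge into the rows addressed by `keep`.
    new_next: Table = {}
    for key, value in next_table.items():
        fields = list(key)
        if fields[port] == drop:
            fields[port] = keep
        new_next[tuple(fields)] = value
    return new_prev, new_next
```

Because the two codes are equivalent, the merged rows carry the same outputs, so the table of layer j simply loses the rows of the dropped code.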



b) Statistical back-learning 

Statistical back-learning procedures covered by the patent
are those procedures that lead to the substitution of
statistically equivalent codes or groups of codes, that is,
groups of codes for which the conditions expressed in (1)
and the others cited above are verified not
deterministically, but with a frequency above a
predetermined threshold.

For example, the expression (1) would be changed to

(2) $\Pr[E_{ij}(A,C) \neq E_{ij}(B,C)] < v_1$ or $\Pr[E_{ij}(C,A) \neq E_{ij}(C,B)] < v_2$

where $\Pr$ is the frequency in the LS and $v_1$, $v_2$ are
given thresholds.

The expressions in (2) mean that the codes A and B may be
considered statistically equivalent if the probability (or,
better, the frequency in the LS) with which they generate
different codes is below the thresholds $v_1$, $v_2$, which
condition the trustworthiness of the substitution.

When $v_1$, $v_2$ tend toward 0, conditions (1) and (2)
tend to coincide. In this case too, statistical equivalence
generalizes to all types of SU and to all cases of
equivalent groups of codes, as in the preceding case.
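The first alternative of condition (2) can be estimated from the learning set as sketched below. How the frequency is to be measured is not detailed in the text; here it is assumed to be the share of LS occurrences of the second-port code C for which the two rows disagree. The names `disagreement_frequency`, `statistically_equivalent` and `context_counts` are hypothetical.

```python
from typing import Dict, Hashable, Tuple

Table = Dict[Tuple[Hashable, Hashable], Hashable]


def disagreement_frequency(
    table: Table, a: Hashable, b: Hashable, context_counts: Dict[Hashable, int]
) -> float:
    """Fraction of LS occurrences of C for which E(a, C) != E(b, C)."""
    total = sum(context_counts.values())
    differing = sum(
        count
        for c, count in context_counts.items()
        if table.get((a, c)) != table.get((b, c))
    )
    return differing / total


def statistically_equivalent(
    table: Table,
    a: Hashable,
    b: Hashable,
    context_counts: Dict[Hashable, int],
    v1: float,
) -> bool:
    """True when the disagreement frequency is below the threshold v1."""
    return disagreement_frequency(table, a, b, context_counts) < v1
```

If "a" and "A" disagree only for a context that accounts for 1 occurrence in 100, they are accepted as equivalent with a threshold of 0.05 and rejected with a threshold of 0.005.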

This back-learning procedure is identical to the
deterministic procedure, except that here statistically
equivalent codes are substituted. Note that every
substitution frees at least one code, which may be used,
when direct statistical learning has been carried out, to
replace an improper code with a proper code, thereby
improving the representativity of the data in the set.

c) Distance-based back-learning

In this procedure the conditions of equivalence, or
statistical equivalence, are replaced by conditions of
"metric equivalence": metrically equivalent codes are those
codes that generate codifications whose distance, under the
metric of the data space in use, is below a predetermined
threshold.

Whichever learning scheme is adopted, the 
data are introduced to the input IN of network 1, 
which is already trained: the outputs of each layer 
constitute the inputs to the next. 

The last layer outputs the result of the processing. A
single complete process requires as many elementary cycles
as there are layers, but occupies the layers one at a time,
in sequence. Thus, the time between successive results (the
reciprocal of the throughput) is equal to the duration of
one elementary cycle.

In the processing phase the SU associate, according to the
training received in the learning phases, the appropriate
output configuration with every configuration of input
data. The set of outputs of the SU of one layer is the
configuration of the input data for the next layer.

Some specific examples of learning procedures 
are presented at the end of the present description. 
Now the structure of such a machine, or of a 
processing circuit able to implement any network 1 
of the type previously described, for the learning 
phase as well, will be described. 

This type of machine (S-machine) is based on a hierarchic
structure of modular units that allow configuration of any
type of network 1. Fig. 4 gives an example of a typical
implementation. The machine is composed of a number (n in
the example) of functionally identical layers and a system
control unit (SCU) that communicates with all the layers
through a system bus BUS, and with both standard (A) and
process (B) peripherals.

Each layer includes a connection management system CMS
(Fig. 5), which establishes the connections between SU
nodes so as to form the network 1 requested.

In the configuration phase SCU defines the 
number of layers to implement, the connectivity 
and the structure of the SU of all layers, sending 
the relative commands to each CMS. 

In the training phase the LS data are sent as input to the
first layer (DATA IN), while the desired outputs are
imposed at the output (OUT). In this phase the SU are
trained with the previously described learning procedures.

When training is completed, the SU will be structured so as
to match each configuration at input with the appropriate
output configuration, according to the procedures described
above. If the SU are implemented as memory units in which
outputs are tabulated as a function of the inputs, which
are therefore interpreted as "addresses" for the
tabulation, then every layer furnishes its output in one
elementary address cycle. The output of each layer
constitutes the input data of the successive layer: the
entire processing of an input data set lasts for as many
cycles as there are layers, independently of the processing
complexity. At the end of a cycle, however, the first layer
is already available for the next input data set; the
effective throughput of the machine is therefore one entire
process per cycle, independent of the process complexity
and of the number of layers requested. If, for example,
components with a cycle of 20 nsec are used, the machine
can execute a complete processing of any complexity about
every 20 nsec. If this corresponded to 100 Mflop/s, the
equivalent capacity of the machine becomes equal to 5
GFlop/s.
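The timing argument can be restated as a short calculation: latency grows with the number of layers, while throughput depends only on the cycle time. The function name `pipeline_timing` is hypothetical.

```python
from typing import Tuple


def pipeline_timing(n_layers: int, cycle_ns: float) -> Tuple[float, float]:
    """Return (latency in ns, results per second) for a fully pipelined machine."""
    latency_ns = n_layers * cycle_ns   # one elementary cycle per layer
    throughput_per_s = 1e9 / cycle_ns  # one complete result per cycle
    return latency_ns, throughput_per_s
```

With a 20 nsec cycle and, say, 5 layers, the latency is 100 nsec and the throughput is 50 million complete processings per second. The 5 GFlop/s figure quoted in the text would follow if each complete processing were counted as the equivalent of 100 floating-point operations; that reading is an assumption, not something the text states.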

The following illustrates all the units that constitute
machine 2 in Fig. 4. It is evident that the intrinsic
modularity of the system allows machines to be realized
with calculation capacity proportional to the hardware
realized. For example, a structure with only one set of SU
and only one CMS allows networks to be configured with any
number of layers, by connecting, at the end of each cycle,
the outputs of the layer to its inputs and changing the SU
memories. In this case the throughput diminishes with the
number of layers: more precisely, the communication means
may be designed so that an entire process that requires n
layers is executed in n elementary address cycles. Each
layer of machine 2 is composed of the set of SU of that
layer, a system for configuring and managing the
connections (CMS), and a layer control unit LCU, as shown
in the example in Fig. 5. The LCU communicates with the SCU
on the system bus BUS, and with the CMS and the SU through
another bus, L-BUS. Obviously, these buses may be replaced
by dedicated or switched lines.

The LCU sends to the CMS the commands to establish the
requested connections, according to the instructions
received from the SCU. Through the SU-L-BUS connection, it
also structures the SU with the requested number of input
and output ports and organizes the internal memory
accordingly. Moreover, when a trained configuration is
loaded from the library, the SU memories are loaded as
well, again through the same connection, as will be
explained in more detail below.

In the learning phases the SU, which have been connected
according to the computing topology requested, operate in
the previously described ways, according to the
deterministic, statistical or distance-based schemes. For
the outermost (output) layer, the LCU provides, through the
L-BUS, for the imposition of the SU outputs. In the
back-learning phase the LCU controls, through the same bus,
the substitution of the equivalent codes.

In the execution phase the layer acts as any layer of a
non-configurable network (that is, one with a hard-wired
topology), associating in one cycle the corresponding
output set with every input set. In that phase the LCU can
carry out diagnostic monitoring of the functioning of the
layer.

Since one of the peculiar characteristics of network 1 is
that it highlights the existence of properties internal to
the data structures, above all with the application of the
back-learning procedures, the LCU is also used to analyze
the contents of the SU memories, to which it has access
through the L-BUS. It sends information about these to the
SCU, which may in this way communicate, by means of the
standard peripherals A, the structures of the primitives
and the properties determined.

Lastly, if one wants to store a complete and trained
configuration of machine 2, the LCU sends to the SCU all
the data about the structuring of the SU connections and
the contents of their memories. In this way the
configuration may become part of the appropriate library.

The CMS (Figs. 6 and 7) is basically a programmable
switching system that, according to rules set out by the
LCU, allows activation of the connections requested between
SU. It may be realized with a modular structure: Fig. 6
shows a functional scheme that exemplifies one such module.
The switches indicated allow all possible connections
between the module's input ports A and B and output ports C
and D to be made. The connections with adjacent modules are
assured by the connections labeled α, β, γ and δ.
Architecturally, the modules that make up the CMS may be
realized with a multibus structure, either completely
parallel as in the example in Fig. 7, or with partial (or
total) sharing of the bus lines. The choice depends on the
speed that one wants to obtain: total parallelism allows
the maximum speed compatible with the components, but
requires a number of lines equal to the maximum number of
SU available on the CMS. On the other hand, with total
sharing the bus is reduced to a single line, but the cycle
is lengthened in proportion to the number of SU addressed
on the CMS.
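The trade-off between bus lines and cycle length can be put in numbers. The two extreme cases come from the text; the intermediate case (partial sharing, with the SU served in groups of `n_lines`) is an assumed interpolation, and the name `cms_cycle_ns` is hypothetical.

```python
import math


def cms_cycle_ns(n_su: int, n_lines: int, base_cycle_ns: float) -> float:
    """Layer cycle time when n_su units share n_lines bus lines."""
    if not 1 <= n_lines <= n_su:
        raise ValueError("n_lines must be between 1 and n_su")
    # Each pass over the bus serves at most n_lines units.
    return base_cycle_ns * math.ceil(n_su / n_lines)
```

For 8 SU and a 20 nsec base cycle, 8 lines give a 20 nsec cycle (total parallelism), a single line gives 160 nsec (total sharing), and 4 lines give 40 nsec under the assumed partial-sharing model.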

The LCU allows the CMS to be "extended" beyond the physical
availability of the hardware, by using the same hardware in
successive cycles.

The SU are the basic processing elements of machine 2, and
therefore of network 1. As was already mentioned, they may
be configured with any number $n_i$ of input ports and
$n_o$ of output ports. Basically their function is to
associate the appropriate configuration of the $n_o$ output
signals with each configuration of the $n_i$ input signals.
They can operate as transcodification matrices and
therefore may be implemented with normal memory devices.

In the learning phase, such memories are filled according
to the previously described procedures. In the data
processing phase (the equivalent of the program execution
phase in a Von Neumann machine) they are used as look-up
tables: input data are interpreted as addresses into tables
of output data. Therefore, whatever the complexity of the
transfer function realized, which is de facto tabulated in
the SU, the output requested is produced in the time taken
by a single address cycle. Since all SU of the same layer
operate in parallel and independently, the execution time
for an entire process consists of as many elementary cycles
as there are layers in the configuration of the machine. It
should be noted, however, that each layer operates on the
data output by the preceding layer (the machine therefore
has a sort of pipeline structure). The average effective
throughput is therefore one entire process per cycle,
independent of the amount of data input, the number of
layers and the complexity of the processing, and therefore
independent of the number of operations needed to realize
it on a Von Neumann machine.

The simplest SU structure is that of a conventional memory,
organized as a table: the input data are used to address,
in a single cycle, the output data (Fig. 8).
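A table-organized SU of this kind can be modelled as a flat memory whose address is formed by concatenating the input codes. The sketch assumes fixed-width binary input codes; the class name `TableSU` and its methods are hypothetical and do not describe the circuit of Fig. 8.

```python
from typing import List, Optional, Sequence


class TableSU:
    """An SU modelled as a memory addressed by its concatenated input codes."""

    def __init__(self, n_inputs: int, bits_per_input: int) -> None:
        self.n_inputs = n_inputs
        self.bits = bits_per_input
        # One memory location per possible input configuration.
        self.memory: List[Optional[int]] = [None] * (1 << (n_inputs * bits_per_input))

    def address(self, inputs: Sequence[int]) -> int:
        """Concatenate the input codes, first port in the most significant bits."""
        if len(inputs) != self.n_inputs:
            raise ValueError("wrong number of inputs")
        addr = 0
        for value in inputs:
            if not 0 <= value < (1 << self.bits):
                raise ValueError("input code out of range")
            addr = (addr << self.bits) | value
        return addr

    def store(self, inputs: Sequence[int], output: int) -> None:
        self.memory[self.address(inputs)] = output

    def lookup(self, inputs: Sequence[int]) -> Optional[int]:
        """Single memory access; None marks a configuration never learned."""
        return self.memory[self.address(inputs)]
```

The memory size grows as 2 to the power of the total number of input bits, which is one reason a content addressable memory can be preferable when only a small part of the input configurations ever occurs.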

To use such a structure, it can be arranged that, in the
learning and back-learning phases, the functions of code
generation and substitution of equivalent codes are carried
out by a specific program, resident in the LCU, which can
be realized with a conventional microprocessor.

The SU however, may be carried out to advan- 
tage with Content Addressable Memory (CAM), es- 
pecially in cases in which it is not necessary to 

40 memorize the output corresponding to all the pos- 
sible input configurations (approximate calculations, 
signal recognition and processing problems, in 
which the intrinsic redundancy is always very 
high). If there is a desire to accelerate to the 

45 maximum the learning and back-learning phases, it 
may be convenient to configure the SU as in- 
dicated in the example in Fig. 10: besides the 
CAM, SU include an empty code detector ECD, a 
code generator CG and a code substitutor CS. 

Alternatively, it may also be convenient to implement in the SU a
memory storing coefficients of suitable polynomials.
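
One possible reading of this alternative is that the SU stores only the
coefficients and computes its output by evaluating the polynomial on the
input. The sketch below assumes that reading; the function name
polynomial_su and the coefficient order (highest degree first) are
hypothetical.

```python
def polynomial_su(coefficients, x):
    """Evaluate the stored polynomial at x using Horner's rule.

    `coefficients` is assumed to be ordered from highest to lowest degree.
    """
    result = 0
    for c in coefficients:
        result = result * x + c
    return result


# 2*x**2 + 0*x + 1 evaluated at x = 3
print(polynomial_su([2, 0, 1], 3))  # 19
```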

Through the command Mode Selection (MS), the LCU communicates the
operating mode to the SU:

In the first phase of statistical learning (calculation of the
frequency of the n-tuples at the input), the CG, which may be a
standard incremental counter, in conjunction with the ECD, increments
by 1 the content of the output section of the CAM that corresponds to
the input configuration that has occurred. In this way, at the end of
the phase each location of the CAM will contain the number of
occurrences of the corresponding input configuration. The set of such
values allows the LCU to decide which configurations must actually be
codified in the CAM, according to procedures of the type previously
described.

In the second phase of statistical and de- 
terministic learning the codes to associate to the 
input configurations are sequentially generated by 
CG. 

Finally, in the back-learning phase CS pro- 
ceeds to substitute equivalent codes, received from 
LCU on the CTS line, by the previously described 
procedure. 
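
The three operating modes can be sketched in software as follows. This
is a minimal model under stated assumptions: the CAM is represented by a
dictionary, the class CamSU and its method names are hypothetical, and
the selection rule (keep configurations whose count surpasses a
threshold) is one example of the procedures referred to above.

```python
class CamSU:
    """Hypothetical SU built around a CAM, modelled here as a dict."""

    def __init__(self):
        self.cam = {}        # input configuration -> stored output word
        self.next_code = 0   # state of the code generator (CG)

    def count(self, config):
        """Phase 1: increment the occurrence counter of this configuration."""
        # A missing entry plays the role of the empty code seen by the ECD.
        self.cam[config] = self.cam.get(config, 0) + 1

    def select(self, threshold):
        """LCU decision (assumed rule): keep counts above `threshold`."""
        kept = [c for c, n in self.cam.items() if n > threshold]
        self.cam = {}
        return kept

    def assign_code(self, config):
        """Phase 2: the CG generates codes sequentially."""
        if config not in self.cam:
            self.cam[config] = self.next_code
            self.next_code += 1
        return self.cam[config]

    def substitute(self, old_code, new_code):
        """Back-learning: the CS replaces a code by its equivalent."""
        for config, code in self.cam.items():
            if code == old_code:
                self.cam[config] = new_code


su = CamSU()
for config in [(0, 1), (0, 1), (1, 1), (0, 1), (1, 0), (1, 1)]:
    su.count(config)
frequent = su.select(threshold=1)       # (1, 0) occurred only once
for config in frequent:
    su.assign_code(config)
su.substitute(old_code=1, new_code=0)   # codes 0 and 1 found equivalent
print(su.cam)  # {(0, 1): 0, (1, 1): 0}
```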

It is possible, lastly, to equip the SU in advance with a device for
distance measurement, and for codification based on it, according to
procedures of the type previously explained.

Since CAMs are standard devices (available on the market as
components) and the ECD, CG and CS are simple devices capable of
being realized in many ways with conventional design techniques (if
necessary even with elementary microprocessors), it is not necessary
here to further develop a possible implementation.

The LCU is designed to manage an entire layer in all the machine's
functions and configurations. Moreover, given the hardware resources
actually available in the layer (number and type of SU) and the
necessities of processing, it provides, when necessary, for their use
in any number of cycles. If, for example, the layer has n SU and the
problem requires 3n SU, the LCU, by means of an appropriate program,
breaks the problem into three parts, each corresponding to n SU.
These are then processed in succession on the n SU physically
available.
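
The splitting of a layer over successive cycles may be sketched as
follows. The function run_layer and its arguments are hypothetical; each
logical SU is represented simply by a dictionary from input pair to
output code.

```python
def run_layer(logical_tables, inputs, physical_su):
    """Evaluate a layer of logical SU on `physical_su` physical units.

    logical_tables: one dict per logical SU (input tuple -> output code).
    inputs: one input tuple per logical SU.
    Returns (outputs, passes); passes is the number of successive cycles.
    """
    outputs = []
    passes = 0
    for start in range(0, len(logical_tables), physical_su):
        # Load the next group of tables into the physical SU, then run it.
        group = logical_tables[start:start + physical_su]
        group_inputs = inputs[start:start + physical_su]
        outputs.extend(table[x] for table, x in zip(group, group_inputs))
        passes += 1
    return outputs, passes


n = 2
xor_table = {(a, b): a ^ b for a in (0, 1) for b in (0, 1)}
tables = [xor_table] * (3 * n)          # the problem requires 3n SU
data = [(0, 0), (0, 1), (1, 0), (1, 1), (1, 0), (0, 0)]
outputs, passes = run_layer(tables, data, physical_su=n)
print(outputs, passes)  # [0, 1, 1, 0, 1, 0] 3
```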

It may therefore be convenient to realize the LCU with a standard
microprocessor, in whose memory the following programs may be stored:
configuration of the layer, with management of the CMS; management of
learning and back-learning; monitoring and synchronization of the
transfers from the previous layer and to the next layer, etc.

Finally, LCU communicates with the SCU, 
through the BUS system bus, in such a way as to 
give the SCU coordination and control of the whole 
machine. 

It is also possible to realize the LCU in hardware, with standard
components, which allows greater speed in the configuration and
learning phases. All the same, the microprocessor solution generally
may be accepted in that the processing times of the execution phase
do not depend on the realization of the LCU: it is in this phase that
great speed normally is necessary, while for learning, which
corresponds to the programming phase of a Von Neumann machine, longer
times may be acceptable.

The principal assignments of the SCU are:

• the configuration of machine 2, i.e. the definition of the number
of layers, of the connectivity and therefore of the single layers
(actuation is by the LCUs);

• management of the learning processes, as previously described;

• management of the standard peripherals 1 (discs, printers, CRT,
etc.);

• system monitoring (functioning, diagnostic tests, etc.);

• loading of library configurations in the SU, for the execution of
processing for which learning already has been done;

• management of the process peripherals B, except those with a high
data rate, which are put on the input and output of the network
constituting machine 2.

All these functions may be advantageously realized on a conventional
computer by developing specific programs, which furthermore do not
require particular insights or techniques. The computer of the SCU
also may be used to train machine 2 to resolve those problems for
which there is an algorithm that provides a solution. In that case it
is sufficient to develop a program that generates the possible input
data to the SU and calculates for them, with the known algorithm, the
relative outputs, memorizing all in the SU memories. This is one of
the types of learning. In this case as well, in the execution phase
machine 2 requires only one cycle per layer, independently of the
number of operations foreseen in the resolving algorithm. The
calculation time on machine 2 therefore may be less than that of a
conventional machine, even by many orders of magnitude.
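
Learning from a known algorithm can be sketched as an exhaustive
tabulation. In the example below the built-in max stands in for the
known algorithm, the 4-bit input range and the two-layer tree of
two-input SU are assumptions, and the names tabulate and execute are
hypothetical.

```python
from itertools import product


def tabulate(function, values):
    """Fill a two-input SU memory by running a known algorithm on every pair."""
    return {(a, b): function(a, b) for a, b in product(values, repeat=2)}


values = range(16)                  # assumed 4-bit input codes
su_table = tabulate(max, values)    # `max` stands in for the known algorithm

# Two layers of two-input SU computing the maximum of four values.
layer1 = [su_table, su_table]
layer2 = [su_table]


def execute(x):
    a = layer1[0][(x[0], x[1])]     # cycle 1: both SU of layer 1 in parallel
    b = layer1[1][(x[2], x[3])]
    return layer2[0][(a, b)]        # cycle 2


print(execute((3, 9, 4, 1)))  # 9
```

Execution needs one table access per layer, however many operations the
tabulated algorithm itself would have required.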

As indicated, the SCU may be used to transfer to mass memory (discs,
optical discs, tapes, etc.) the machine 2 configuration data as well
as the contents of all the SU. In this way it is possible to build a
library of processes already implemented and stored in mass memory
that allows the complete configuration for the execution of a process
to be reloaded on the machine, thus reaching the same level of
flexibility that libraries of programs give conventional machines.
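
A library entry of this kind could be stored and reloaded along the
following lines. The JSON file layout and the function names
save_configuration and load_configuration are assumptions made for
illustration only; the description does not prescribe a storage format.

```python
import json
import os
import tempfile


def save_configuration(path, layers):
    """Store the SU contents of every layer; tuple keys become 'a,b' strings."""
    serializable = [
        [{",".join(map(str, key)): out for key, out in su.items()}
         for su in layer]
        for layer in layers
    ]
    with open(path, "w") as handle:
        json.dump({"layers": serializable}, handle)


def load_configuration(path):
    """Reload a stored configuration into the same nested structure."""
    with open(path) as handle:
        stored = json.load(handle)
    return [
        [{tuple(int(v) for v in key.split(",")): out
          for key, out in su.items()}
         for su in layer]
        for layer in stored["layers"]
    ]


and_table = {(a, b): a & b for a in (0, 1) for b in (0, 1)}
network = [[and_table, and_table], [and_table]]   # two layers of SU
path = os.path.join(tempfile.mkdtemp(), "and_tree.json")
save_configuration(path, network)
print(load_configuration(path) == network)  # True
```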

Finally, it may be convenient to combine the functions of the LCU
with those of the SCU: in that case the LCU hardware is replaced by
programs executed in the SCU, which must be interfaced directly with
the SU and the CMS of all layers.

EXAMPLES



Applications of the proposed invention will now be described with
reference to three specific, non-limiting examples. In each case, the
input description, the topology of the network, the structure of the
SU and related details are only specified for the sake of
illustration and could be modified in accordance with the
requirements of the processing problem at hand.



Image recognition (typed and handwritten numerical characters)

The characters, digitized in a binary matrix of pixel elements (e.g.
a 16 x 32 matrix), are sent to the first layer of a network of five
two-dimensional layers of the type shown in Fig. 2, with each SU of
the same layer defining the same function. The bit-maps may be
encoded through 32 integer values, each of which encodes a 4 x 4
portion of the bit-map. At each level two adjacent codes are
substituted by an appropriate output code, thereby halving the length
of the description at each step. The last level has two codes as
input. A code that represents the class of characters input to the
network is assigned to these in output. During the learning phase,
carried out with the previously described direct statistical method,
the network is trained by requiring each character of the LS to
correspond to the output code of its class of characters. At the end
of that phase the system is able to recognize exactly all the
configurations in the LS. After the statistical back-learning
procedure, one passes to the recognition phase.
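
The input encoding of this example can be sketched as follows. The bit
order inside a 4 x 4 tile and the row-major order of the tiles are
assumptions; encode_bitmap and layer_sizes are hypothetical helper
names.

```python
def encode_bitmap(bitmap, block=4):
    """Encode a binary bitmap as one integer per block x block tile."""
    rows, cols = len(bitmap), len(bitmap[0])
    codes = []
    for top in range(0, rows, block):
        for left in range(0, cols, block):
            code = 0
            for r in range(top, top + block):
                for c in range(left, left + block):
                    code = (code << 1) | bitmap[r][c]
            codes.append(code)
    return codes


def layer_sizes(num_codes):
    """Number of codes entering each layer when adjacent pairs are merged."""
    sizes = []
    while num_codes >= 2:
        sizes.append(num_codes)
        num_codes //= 2
    return sizes


bitmap = [[0] * 16 for _ in range(32)]   # blank 16 x 32 character matrix
bitmap[0][3] = 1                         # one set pixel in the first tile
codes = encode_bitmap(bitmap)
print(len(codes), codes[0], layer_sizes(len(codes)))
# 32 4096 [32, 16, 8, 4, 2]
```

The 512 pixels yield 32 codes, and halving the description at each step
gives the five layers mentioned above, the last of which receives two
codes.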



Speech recognition 

The S-Net may be composed of 64 two-input, single-output SU on six
levels. Each phonetic signal can be described by its spectrum,
obtained for example by applying a Fast Fourier Transform. This
spectrum may be processed by subdivision into 352-sample time
windows, corresponding to about 23.5 ms at a sampling frequency of
15 kHz. The pace of advancement from one window to the next varies
according to the normalization used. For each of these windows, the
first eight mel-scaled cepstral coefficients can be calculated.
Through a Vector Quantization (VQ) these data can be compressed into
a VQ-Codebook of 256 distinct codes. The network may thus be trained
by having the phoneme identification code correspond to the spectra
of known phonemes (LS). After the statistical back-learning
procedure, the network is then able to associate phoneme codes to the
spectra of unknown signals (recognition phase).
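
The windowing step can be checked with a few lines of code. The advance
between windows (hop) is an assumed value, since the text only states
that it depends on the normalization used; frame_signal is a
hypothetical helper name.

```python
def frame_signal(samples, window=352, hop=176):
    """Split a sampled signal into fixed-length windows.

    `hop` is an assumed advance between consecutive windows.
    """
    return [samples[i:i + window]
            for i in range(0, len(samples) - window + 1, hop)]


sample_rate_hz = 15_000
window_ms = 1000 * 352 / sample_rate_hz      # duration of one window
frames = frame_signal(list(range(15_000)))   # one second of dummy samples
print(round(window_ms, 2), len(frames))
```

With these figures a window lasts roughly 23.5 ms, in line with the
value quoted above.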

Edge detection from monochrome images

The S-Net may consist of three levels of SU with two inputs and one
output. An image sampling at 512 x 512 pixels with 256 grey levels
(1 byte) per pixel may be executed. The images input to the network
may then be represented by a 512 x 512 byte matrix. Such images are
then sampled, for example, by using 3-bit-per-pixel codification
techniques, taking into consideration 'Y'-form outlines (two pixels
above at the sides and one pixel below). Each bit is then set equal
to 1 if one of the nearby pixels has a value greater than the central
pixel. The direct deterministic learning algorithm is then applied as
previously described. For the back-learning procedure the statistical
algorithm may instead be used.



Claims 


1. Data processing network with interconnective nodes (SU), including
nodes ordered in many hierarchic layers, in which each node (SU) has
its respective transfer function, said transfer function being
selectively determinable in differentiated mode for diverse
processing functions.

2. Machine to realize a data processing network in conformance with
Claim 1, characterized by its including:

• multiple units (SU) that function as nodes, organized in several
hierarchic layers,

• for each layer, a connection management system (CMS) able to make
connections between the nodes (SU) of its respective layer and the
nodes (SU) of at least one adjacent layer in the hierarchic
structure,

• a system control unit (SCU) and

• at least one connection system (BUS) that acts as agent between
said system control unit (SCU) and all machine layers able to realize
at least one learning strategy in which:

a data learning set (LS) is sent to the input of the bottom hierarchic
layer of the machine while at the output of the top layer at least
one desired output is imposed; the output of each layer is designed
to build the input data set of the layer immediately above it in the
machine's hierarchic structure.

3. Machine in conformance with Claim 2, characterized by inclusion of
configuration pieces (L-BUS) able to structure the network nodes (SU)
with a number of inputs and outputs selectively determined to meet
the data processing needs.

4. Machine in conformance with Claim 2 or Claim 3, characterized by
the inclusion, as alternatives or in combination, of:

• memory units to store a predetermined network configuration data
set, at the end of a learning phase, and

• units (A) to read, from the outside, data configurations between
the network and an outside source.

5. Machine in conformance with any of the Claims from 1 to 4, wherein
at least some of said nodes (SU) are realized in the form of a
look-up table or, alternatively, as a memory storing coefficients of
suitable polynomials.

6. Machine in conformance with Claim 2, 
wherein said Connection Management System 
(CMS) is basically made up of a communication 
network. 

7. Learning procedure to build a network in 
conformance with Claim 1, characterized by its 
containing the following operations: 

• to bring to the network input a set (LS) of 
statistically significant configurations of input data, 
and 

• to impose at the network output (OUT) configura- 
tions corresponding to the desired data processing 
result. 

8. Procedure in conformance with Claim 7, 
characterized by the inclusion of operations that: 

• generate, for each data configuration brought to 
the input layer of network (1), an input-output asso- 
ciation code, and 

• propagate that phase to the top of network (1). 

9. Procedure in conformance with Claim 8, 
characterized by also including an operation to 
substitute those codes whose association always 
produces the same exit codes, thereby identifying 
equivalent codes and proceeding to eliminate re- 
dundant codes. 

10. Procedure for realizing a network (1) in 
conformance with Claim 1, characterized by inclu- 
sion of the following operations: 

• bringing to the input layer of network (1) a 
learning set (LS) of data representative of the uni- 
verse of data processing problems under examina- 
tion; 

• to codify said learning set (LS) at the lowest 
hierarchic layer of network (1) so as to generate 
respective outputs destined to constitute the inputs 
to the hierarchic layer immediately above, 

• to propagate said codification action ordinarily 
proceeding toward the hierarchic layers above net- 
work (1) up to the penultimate layer of network (1) 
itself, 

• to associate, at the top hierarchic layer in net- 
work (1) a code consisting of the desired solution; 
the unassigned codes in the above-mentioned 
phases correspond to unclassifiable configurations. 

11. Procedure in conformance with Claim 10, characterized by having
the learning set (LS) formed from a frequency statistic of the
possible occurrences of input signals foreseen for network (1),
incorporating in said set (LS) only the input data the global
frequency for which surpasses a predetermined threshold.

12. Procedure in conformance with Claim 11, characterized by learning
data the global frequency of which does not surpass said
predetermined threshold. Instead, the same code associated to data
included in said set (LS) is associated to them, these presenting a
distance that does not go below a given threshold.

13. Learning procedure for any of Claims 10-12, characterized,
moreover, by inclusion of operations that:

• determine whether groups of input codes on a node (SU) produce the
same codes at the output of the node (SU) itself so as to be
equivalent for the purposes of data processing, and

• identify (unify) such equivalent codes.

14. Procedure in conformance with Claim 13, wherein the determination
of equivalent codes and the unification of equivalent codes are
carried out gradually starting from the highest hierarchic layers of
network (1), the determination of equivalence being made by
identifying structures and properties internal to the data.

15. Procedure in conformance with Claim 13 or Claim 14, characterized
by its inclusion of the unification operations between the codes
that, even though not being deterministically equivalent, are shown
to be equivalent with a frequency above a predetermined threshold.

16. Procedure in conformance with Claim 13 or Claim 14, characterized
by its inclusion of a unification operation between codes that, even
though not deterministically equivalent, correspond to data with a
distance below a predetermined threshold in the sphere of a
respective metric.

17. Use of a network in conformance with Claim 1 and/or of a machine
in conformance with any claim from 2 to 6 for signal processing,
particularly for an application chosen from the following group:
configuration analysis, configuration recognition and configuration
understanding.

18. Set of networks in conformance with Claim 
1 or of machines in conformance with any of the 
claims from 2 to 6 in the form of a library. 

EP 0 393 571 A1 




[Drawing sheets: Fig. 8, Fig. 9 and Fig. 10. Legible labels: LAYER 1,
BUS, MS, CAM, INPUT.]

EUROPEAN SEARCH REPORT

European Patent Office
Application Number: EP 90 10 7235

DOCUMENTS CONSIDERED TO BE RELEVANT

Classification of the application (Int. Cl.5): G 06 F 15/80;
G 06 F 15/18

Technical fields searched (Int. Cl.5): G 06 F 15/80; G 06 F 15/18;
G 11 C 11/54; G 11 C 15/00

Category X, relevant to claims 1, 7:
IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING,
vol. 36, no. 7, July 1988, pages 1180-1190, IEEE, New York, US;
J.J. VIDAL: "Implementing neural nets with programmable logic"
* Page 1180, column 2, lines 4-18; page 1182, column 1, lines 19-47;
column 2, lines 1-4,13-29; page 1183, column 1, lines 26-30;
page 1184, column 1, lines 10-53; column 2, lines 8-34; page 1185,
column 1, lines 1-16; figures 2,3,4; page 1187, column 1,
lines 28-40 *

Category A, relevant to claims 2, 10:
IDEM

Category X, relevant to claims 1, 7:
EP-A-0 195 569 (XEROX)
* Page 1; page 2, lines 1-11; page 4, lines 3-17; page 9, lines 8-28;
page 10, lines 32-36; page 11, lines 1-5; figures 1,8 *

Category A, relevant to claims 1, 2, 4, 5:
PROCEEDINGS, FIRST INTERNATIONAL CONFERENCE ON SUPERCOMPUTING
SYSTEMS, St. Petersburg, Florida, 16th-20th December 1985,
pages 641-649, IEEE, New York, US; S. KUMAR et al.: "A multi-level
associative processor architecture for parallel processing"
* Page 641, column 1, lines 1-7; page 642; figure 1 *

Category A, relevant to claims 1, 2, 4, 5:
US-A-4 069 473 (VITALIEV)
* Column 3, lines 65-68; column 4, lines 1-42; column 5, lines 49-60;
figures 1,3,4 *
The present search report has been drawn up for all claims.

Place of search: THE HAGUE
Date of completion of the search: 12-07-1989
Examiner: DHEERE R.F.B.M.



CATEGORY OF CITED DOCUMENTS

X : particularly relevant if taken alone
Y : particularly relevant if combined with another document of the
same category
A : technological background
O : non-written disclosure
P : intermediate document
T : theory or principle underlying the invention
E : earlier patent document, but published on, or after the filing
date
D : document cited in the application
L : document cited for other reasons
& : member of the same patent family, corresponding document




Category (illegible), relevant to claims 1, 2, 4, 5, 7, 10, 11, 12,
15, 16:
IEEE FIRST INTERNATIONAL CONFERENCE ON NEURAL NETWORKS, San Diego,
CA, 21st-24th June 1987, pages III-411 - III-418, IEEE, New York, US;
L.T. CLARK et al.: "Comparison of a pipelined "Best match" content
addressable memory with neural networks"
* Page 411, lines 13-29; page 412, figure 1; page 414, figure 2;
page 415, lines 22-27 *

